[   i ,   s u s h i   -   a n   e x t e n s i o n   o f   s e q u i t u r   ]

Given k features, k-1 edges would exist in a linear corpus as opposed to k2-k if it were unordered.

A corpus would have to contain multiple training sets since most information modeled as unordered sets don’t typically have item repetition.

A collection of open ports on a host will never contain the same port twice, nor will the order in which the ports are listed make a difference.

The digram ‘TH’ may occur many times in a single sentence and the order of ‘T’ and ‘H’ in the digram matters, as “HT” is distinctly different.

Given a collection of sets, recursively replace the most common two-feature pair across all sets with a new token and an expansion rule. Once no two-feature pair occurs more than once, stop. For any dictionary symbol occurring only once across all expansion rules and main corpus, expand the symbol and delete the expansion rule.